ABSTRACT



Three sources of information useful in evaluating 
the adequacy of reported research are discussed: articles, 
checklists, and rating scales. The chronological and genealogical 
relationships among some of these sources, their general approach, 
and some considerations for their use are indicated. A bibliography 
of 42 sources is provided. (DG) 
A REVIEW OF INSTRUMENTS DEVELOPED 
TO BE USED IN THE EVALUATION OF 
THE ADEQUACY OF REPORTED RESEARCH 

by 

Bruce B. Bartos 

Phi Delta Kappa & Indiana University 



The three-fold purpose of this review will be to (1) provide a 
list of articles, checklists, and rating instruments, (2) show the 
chronological and genealogical relationships among some of the 
instruments, and (3) indicate the general approaches of these 
various types of instruments along with some crucial 
considerations for their use. 

Purpose (1) has been satisfied by the provision of the 38 
references on the appended list at the end of this paper. Of these, 
I have reproductions of 32. They have been arranged in 
chronological rather than alphabetical order. 

Gephart, who is chairing this Symposium, compiled an 
extensive bibliography and instrument collection in the course of 
completing his doctoral dissertation entitled “The Development 
of an Instrument for Evaluating Research Reports.” The instrument 
therein, appropriately called the “Research Evaluation 
Instrument,” was one of three minutely examined by Caroline S. 
Hodges, at the Bureau of Applied Social Research, in her Master's 
thesis. Hodges located still more efforts in this direction. It was 
these combined references that formed the nucleus for and gave 
impetus to my endeavors. 

Chronologically, output in this area followed an interesting 
pattern from 1922 to 1967. In the 30 years from 1922-52 only 
six instruments were developed. This was followed by double that 
(or twelve) in the next 10 years, 1952-1962. This total of 18 
instruments in 40 years can now be compared with 20 developed 
in the next five years alone. At this rate, it seems reasonable, and 
frightening, to estimate that by 1970 20 more instrument 
development efforts will have required the energy of educational 
researchers. 






Genealogically, things are not quite so clear-cut. Fewer than half 
of these studies have bibliographic references included. Without 
these references a tracing of their antecedents becomes rather 
difficult. Of note, though, are the institutional influences of The 
Ohio State University and Columbia University, New York. The 
Ohio State University served as base for Clark, Guba, Kapfer, 
Cook, Gephart, Schneider, and Cady. Clark, Guba, and Smith 
have been mutually influential, even to the extent that all three 
are now at Indiana University. Gephart, after moving to the 
University of Wisconsin-Milwaukee, produced papers jointly with 
Ingle and Remstad. Similarly, Columbia University has known 
Sexier, Symonds, Nasatir, Sieber, Hodges, and Joel and Jean 
Davitz. These two groups account for 19, or 50%, of the studies 
found. 

When the documents are arranged by type, they divide into 
three categories: articles, checklists, and rating instruments. The 
operational definition of these categories hinges on their manner 
of indicating the guidelines by which research reports should be 
evaluated. The articles are expository in nature, employing 
declarative sentences, and embedding the guidelines in the body of 
the text. Checklists are columns of questions, usually subsumed 
under criterial headings, and requiring only that one consider 
whether the question applies to some particular report. Instruments 
come equipped with multi-level, multi-faceted scales upon 
which one locates his answers to a string of questions or 
statements. However, while the categories may seem to be neat, 
square boxes, the documents have round corners and bulging 
sides. 

Typical of the article type of approach is Fox (12), who lists 
seven “criteria” including the purpose of the research; the 
research procedures; the research design; the limitations of the 
design; the analysis of the data; the conclusions; and the 
experience of the investigator. He closes his article with 
statements to the effect that while this is not an extensive list, “...if, 
however, the criteria help to make readers more constructively 
critical of reported findings of research, they will have served 
their purpose.” Perdew (6) and Spence (19) follow a similar style. 
Perdew felt style of reporting should be a prime consideration, 
while Spence headed his list with a cautionary note about 
the investigator's credentials. 

It will be noted that earlier I put quotation marks around the 
word “criteria” with reference to Fox. This was because I feel 
almost none of the articles or instruments uncovered in my 
investigations completely satisfy the definition for a criterion. A 
criterion is a standard, like an inch in the measurement of length, 
having a zero baseline, and composed of finite increments. In lieu 
of the precise measurements of physics, research report criteria 
should be arrived at through examples of consensually designated 
good and bad items. 

Gephart (25) and Ingle and Gephart (32) are examples of a 
more adequate way to present criteria. The Gephart article 
discusses over thirty distinct criteria for methodological 
adequacy. Using those criteria, Ingle and Gephart do a pretty fair job 
of constructively critiquing a piece of research under the three 
headings: the Hypothesis, the Evidence (data), and the Inference 
Pattern (logical structure). Hodges (30) also developed baselines 
for the comparison of her criteria. 

Fully a third of the documents referred to herein are 
checklists. Most are a series of questions arranged in table form, 
though a few are positive statements. They range from 
Gibboney's (17) seven-item gauge, used to include or exclude 
research for a further review, to Symonds' (9) 143 questions with 
which his Educational Psychology department reviewed 
dissertation proposals. It is misleading, however, to leave you believing 
that Symonds has that many distinct elements when actually they 
are grouped under 13 separate headings. The majority of the lists 
use between 5 and 15 main headings, and these headings tend to 
be the same ones Fox, Perdew, and Spence employ. 

The most unique approach can be seen in the hybrid 
“checkstrument” (my word) developed by Smith (15). First, he 
assesses the inadequacies, not the attributes, probably in the 
belief that, as with whole cloth, it is easier to spot the flaws than 
it is to praise the completed product. Second, he supplies a 
five-item code which ranges from the “inapplicability” of the 
inadequacy to “its presence is a serious flaw.” Next, Smith gives 
examples as aids to answer most of his 52 (and numerous sub-) 
questions. And finally, he, like Gephart (22) and Nasatir (18, 29), 
provides an overall evaluation question. 

Stephens (36), and to a lesser degree Smith (15), constructed a 
programed decision tree approach for their instruments. While 
Smith’s is implicit, the explicit flow chart drawn by Stephens 
clarifies and unifies his checklist's use. In making the point that 
just as the most competent researcher may occasionally do a poor 
piece of research, so might the “duffer, by good luck, come up 
with a useful product,” Stephens emphasizes outcomes rather 
than procedures and methodologies. In this he differs from all his 
predecessors. 

Although almost any instrument developed for the evaluation 
of completed research could likewise be used at the planning and 
proposal stages, only five of the authors take care to point this 
out. If consumers of research are going to be encouraged to use 
the various aids mentioned in this paper, they should be fully 
apprised of the context of their development and possible 
alternative uses. Symonds (9) prepared his guide for persons in 
educational psychology; Schneider and Cady (27) were concerned 
about music research; Kapfer, science education; and Suydam, 
primary school arithmetic. 

The last differentiating set of characteristics I will mention 
today is the type of research for which the instrument is 
intended. Johnson (10) recommends his rating instrument for 
both survey and experimental research, while Cook (?1) limits his 
to experimental research only. Gephart, Ingle, and Remstad (33) 
analyzed comparative studies, and Perdew (6) felt that historical 
research should come under more methodical scrutiny. 

I hope it has become apparent that the evaluation of research, 
both in proposals and final reports, has had a lengthy and varied 
history. The search for and development of adequate instruments 
must be continued. And interested members of AERA seem best 
suited to the task. 